GYM: A Multiround Join Algorithm In MapReduce And Its Analysis

Authors

  • Foto Afrati
  • Manas Joglekar
  • Christopher Re
  • Semih Salihoglu
  • Jeffrey Ullman
Abstract

We study the problem of computing the join of n relations in multiple rounds of MapReduce. We introduce a distributed and generalized version of Yannakakis's algorithm, called GYM. GYM takes as input any generalized hypertree decomposition (GHD) of a query of width w and depth d, and computes the query in O(d + log(n)) rounds and O(n(IN^w + OUT)^2 / M) communication cost, where M is the memory available per machine in the cluster and IN and OUT are the sizes of the input and output of the query, respectively. M is assumed to be IN^(1/ε) for some constant ε > 1. Using GYM we achieve two main results: (1) Every width-w query can be computed in O(n) rounds of MapReduce with O(n(IN^w + OUT) ...
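For readers unfamiliar with the serial algorithm that GYM generalizes, here is a minimal Python sketch of the classic Yannakakis algorithm on an acyclic join (a join tree is a width-1 GHD). It is not the authors' distributed implementation: it runs in memory on one machine, and the names `semijoin`, `join`, `yannakakis`, and the `children` mapping are assumptions made for illustration only.

```python
from typing import Dict, List

Row = Dict[str, int]
Relation = List[Row]


def _key(row: Row, attrs: List[str]) -> tuple:
    """Project a row onto the given attributes."""
    return tuple(row[a] for a in attrs)


def semijoin(left: Relation, right: Relation) -> Relation:
    """Keep the rows of `left` that match some row of `right` on shared attributes."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {_key(r, shared) for r in right}
    return [l for l in left if _key(l, shared) in keys]


def join(left: Relation, right: Relation) -> Relation:
    """Natural join of two relations using a hash index on the shared attributes."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index: Dict[tuple, List[Row]] = {}
    for r in right:
        index.setdefault(_key(r, shared), []).append(r)
    return [{**l, **r} for l in left for r in index.get(_key(l, shared), [])]


def yannakakis(relations: Dict[str, Relation],
               children: Dict[str, List[str]],
               root: str) -> Relation:
    """Serial Yannakakis: upward semijoins, downward semijoins, then joins."""
    rels = {name: list(rows) for name, rows in relations.items()}

    # Post-order traversal of the join tree: children before parents.
    order: List[str] = []

    def visit(node: str) -> None:
        for child in children.get(node, []):
            visit(child)
        order.append(node)

    visit(root)

    # Upward phase: reduce each parent by its (already reduced) children.
    for node in order:
        for child in children.get(node, []):
            rels[node] = semijoin(rels[node], rels[child])

    # Downward phase: reduce each child by its (already reduced) parent.
    for node in reversed(order):
        for child in children.get(node, []):
            rels[child] = semijoin(rels[child], rels[node])

    # Join phase: no dangling tuples remain, so joins go bottom-up.
    joined: Dict[str, Relation] = {}
    for node in order:
        acc = rels[node]
        for child in children.get(node, []):
            acc = join(acc, joined[child])
        joined[node] = acc
    return joined[root]


# Chain query R(a,b) JOIN S(b,c) JOIN T(c,d), with join tree R -> S -> T.
example = yannakakis(
    {
        "R": [{"a": 1, "b": 2}, {"a": 3, "b": 4}],
        "S": [{"b": 2, "c": 5}, {"b": 4, "c": 6}, {"b": 9, "c": 5}],
        "T": [{"c": 5, "d": 7}],
    },
    children={"R": ["S"], "S": ["T"]},
    root="R",
)
```

In this sketch each tree level is processed sequentially, which is why a deep join tree needs many steps. The abstract's O(d + log(n)) round bound concerns how GYM parallelizes this kind of tree traversal across MapReduce rounds, which the sketch does not attempt to model.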

Similar resources

GYM: A Multiround Join Algorithm In MapReduce

Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for a spectrum of rounds for the problem of computing the equijoin of n relations. Specifically, given any query Q with width w, intersection width iw, input size IN, output size OUT, ...

GYM: A Multiround Distributed Join Algorithm

Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for several rounds for the problem of computing the equijoin of n relations. Given any query Q with width w, intersection width iw, input size IN, output size OUT, and a cluster of mac...

Cost Based Multi-Way Equi-Join Optimization in MapReduce

MapReduce is a prominent programming model built on a shared-nothing architecture for processing big data with a parallel, distributed algorithm on a cluster. Join is an important operation but is very inefficient in MapReduce. In this work, a time cost based evolution model is proposed for multi-way join by considering the time cost calculation. A multi-way join consists of star pattern joins and chai...

How Reduce Side Join Part File Expressions Equal MapReduce Structure into Task Consequences, Performance?

An intention of MapReduce Sets for Reduce side join part file expressions analysis has to suggest criteria how Reduce side join part file expressions in Reduce side join part file data can be defined in a meaningful way and how they should be compared. Similitude based MapReduce Sets for Reduce side join part file Expression Analysis and MapReduce Sets for Assignment is expected to adhere to fu...

Generalized Parallel Join Algorithms and Designing Cost Models

Applications for large-scale data analysis use techniques such as parallel DBMSs, the MapReduce (MR) paradigm, and columnar storage. In this paper we focus on a MapReduce environment. The aim of this work is to compare different join algorithms and to design cost models for further use in the query optimizer.

Publication date: 2015